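We present Point-BERT, a new paradigm for learning Transformers that generalizes the concept of BERT to 3D point clouds. Inspired by BERT, we devise a Masked Point Modeling (MPM) task to pre-train point cloud Transformers. Specifically, we first divide a point cloud into several local point patches, and design a point cloud Tokenizer with a discrete Variational AutoEncoder (dVAE) to generate discrete point tokens containing meaningful local information. Then, we randomly mask out some patches of the input point clouds and feed them into the backbone Transformer. The pre-training objective is to recover the original point tokens at the masked locations, under the supervision of the point tokens obtained by the Tokenizer. Extensive experiments demonstrate that the proposed BERT-style pre-training strategy significantly improves the performance of standard point cloud Transformers. Equipped with our pre-training strategy, a pure Transformer architecture attains 93.8% accuracy on ModelNet40 and 83.1% accuracy on the hardest setting of ScanObjectNN, surpassing carefully designed point cloud models with far fewer hand-crafted designs. We also demonstrate that the representations learned by Point-BERT transfer well to new tasks and domains, where our models substantially advance the state of the art in few-shot point cloud classification. The code and pre-trained models are available at https://github.com/lulutang0608/Point-BERT.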
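As a rough illustration of the patch-and-mask step, the sketch below groups a point cloud into local patches and randomly hides a fraction of them. It assumes farthest point sampling plus k-nearest-neighbour grouping to form patches and a uniform random mask; the patch count, k, and the 40% mask ratio are illustrative values, not settings taken from the paper. The dVAE Tokenizer that produces the target point tokens is not sketched.

```python
import math
import random

# Assumption: FPS + kNN grouping and uniform random masking; all sizes are illustrative.

def farthest_point_sampling(points, num_centers):
    """Greedily pick point indices that are far from every center chosen so far."""
    chosen = [0]  # start from the first point (an arbitrary but deterministic choice)
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < num_centers:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(min_dist, points)]
    return chosen

def group_patches(points, num_patches, k):
    """Form one local patch per center: the indices of its k nearest points (center included)."""
    centers = farthest_point_sampling(points, num_patches)
    patches = [
        sorted(range(len(points)), key=lambda i: math.dist(points[i], points[c]))[:k]
        for c in centers
    ]
    return centers, patches

def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean list; True marks a patch hidden from the backbone Transformer."""
    masked = set(rng.sample(range(num_patches), round(num_patches * mask_ratio)))
    return [i in masked for i in range(num_patches)]

rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(256)]
centers, patches = group_patches(cloud, num_patches=16, k=32)
mask = random_patch_mask(len(patches), mask_ratio=0.4, rng=rng)
visible = [patch for patch, hidden in zip(patches, mask) if not hidden]
```

Only the visible patches would be embedded and fed to the Transformer; the loss would then compare its predictions at the masked positions with the Tokenizer's tokens for those patches.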
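Federated learning (FL) enables devices in mobile edge computing (MEC) to collaboratively train a shared model without uploading their local data. Gradient compression can be applied to reduce the communication overhead, but FL with gradient compression still faces great challenges. Toward green MEC, we propose FedGreen, which enhances the original FL with fine-grained gradient compression to efficiently control the total energy consumption of the devices. Specifically, we introduce the relevant operations, including device-side gradient reduction and server-side element-wise aggregation, to facilitate gradient compression in FL. Using a public dataset, we investigate the contribution of the compressed local gradients under different compression ratios. We then formulate and solve a tradeoff problem between learning accuracy and energy efficiency, deriving the optimal compression ratio and computing frequency for each device. Experimental results show that, given an 80% test accuracy requirement, FedGreen reduces the total energy consumption of the devices by at least 32% compared with the baseline schemes.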
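The abstract does not spell out the compression operator. A common choice that fits the description is top-k sparsification on each device and a per-coordinate average on the server; the sketch below assumes exactly that, with hypothetical per-device ratios standing in for the compression ratios FedGreen derives. The energy model and the joint optimization of compression ratio and computing frequency are not sketched.

```python
# Assumption: "gradient reduction" = top-k sparsification; "element-wise aggregation" =
# averaging each coordinate over the devices that actually uploaded it.

def top_k_sparsify(grad, ratio):
    """Device side: keep the `ratio` fraction of entries with the largest magnitude."""
    k = max(1, int(len(grad) * ratio))
    keep = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return {i: grad[i] for i in keep}  # sparse upload: {coordinate: value}

def elementwise_aggregate(sparse_grads, dim):
    """Server side: average each coordinate over only the devices that sent it."""
    sums, counts = [0.0] * dim, [0] * dim
    for grad in sparse_grads:
        for i, value in grad.items():
            sums[i] += value
            counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

# Each device uses its own compression ratio (hypothetical values).
device_a = top_k_sparsify([0.9, -0.1, 0.0, -0.5], ratio=0.5)
device_b = top_k_sparsify([0.1, 0.8, -0.3, 0.2], ratio=0.25)
global_grad = elementwise_aggregate([device_a, device_b], dim=4)
```

Averaging over the uploading devices only, rather than over all devices, keeps a coordinate that few devices send from being shrunk toward zero; whether FedGreen weights coordinates this way is an assumption.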
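Nowadays, pre-training big models on large-scale datasets has become a key topic in deep learning. Pre-trained models with high representation ability and transferability have achieved great success and dominate many downstream tasks in natural language processing and 2D vision. However, extending this pre-training-tuning paradigm to 3D vision is non-trivial, given that 3D training data are limited and relatively inconvenient to collect. In this paper, we offer a new perspective on this problem: leveraging pre-trained 2D knowledge in the 3D domain by tuning pre-trained image models with a novel Point-to-Pixel prompting for point cloud analysis at a minor parameter cost. Following the principle of prompt engineering, we transform point clouds into colorful images with geometry-preserved projection and geometry-aware coloring to adapt them to pre-trained image models, whose weights are kept frozen during the end-to-end optimization of point cloud analysis tasks. Extensive experiments demonstrate that, in cooperation with the proposed Point-to-Pixel prompting, better pre-trained image models consistently lead to better performance in 3D vision. Benefiting from the prosperous development of image pre-training, our method attains 89.3% accuracy on the hardest setting of ScanObjectNN, surpassing conventional point cloud models with far fewer trainable parameters. Our framework also shows very competitive performance on ModelNet classification and ShapeNet Part segmentation. Code is available at https://github.com/wangzy22/p2p.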
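To show what "geometry-preserved projection" and "geometry-aware coloring" can mean in the simplest case, the sketch below is a fixed, non-learned stand-in: it projects points orthographically along z onto a small grid, keeps the nearest point per pixel with a z-buffer, and colors each pixel from the normalized (x, y, z) coordinates. The projection axis, normalization, and coloring rule are assumptions, not P2P's actual modules; in P2P the resulting image would be passed to a frozen pre-trained image model, which is omitted here.

```python
import math

def point_to_pixel(points, size):
    """Project (x, y, z) points onto a size x size image along the z axis.

    Assumption: each axis is min-max normalized, a z-buffer keeps the point with the
    smallest normalized z per pixel, and color = normalized (x, y, z) scaled to 0..255.
    """
    lows = [min(p[a] for p in points) for a in range(3)]
    highs = [max(p[a] for p in points) for a in range(3)]

    def normalize(p):
        return [(p[a] - lows[a]) / (highs[a] - lows[a]) if highs[a] > lows[a] else 0.0
                for a in range(3)]

    image = [[(0, 0, 0)] * size for _ in range(size)]   # black background
    depth = [[math.inf] * size for _ in range(size)]    # empty pixels have infinite depth
    for p in points:
        nx, ny, nz = normalize(p)
        u, v = min(int(nx * size), size - 1), min(int(ny * size), size - 1)
        if nz < depth[v][u]:  # the nearer point occludes the farther one
            depth[v][u] = nz
            image[v][u] = tuple(round(255 * c) for c in (nx, ny, nz))
    return image, depth

image, depth = point_to_pixel([(0.0, 0.0, 1.0), (0.0, 0.0, 0.5), (1.0, 1.0, 0.0)], size=2)
```

Here the second point lands on the same pixel as the first but is nearer, so it overwrites the first point's color; the pixel at row 0, column 1 receives no point and keeps infinite depth.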
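Skeleton-based action recognition has recently made rapid progress and achieved remarkable performance. In this paper, we investigate this problem under a cross-dataset setting, a new, pragmatic, and challenging task in real-world scenarios. Following the unsupervised domain adaptation (UDA) paradigm, action labels are available only on the source dataset and not on the target dataset during training. Unlike conventional adversarial-learning-based approaches to UDA, we use a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. Our inspiration comes from Cubism, an early-20th-century art genre that breaks objects apart and reassembles them to convey a greater context. By segmenting and permuting temporal segments or human body parts, we design two self-supervised classification tasks that explore the temporal and spatial dependencies of skeleton-based actions and improve the generalization ability of the model. We conduct experiments on six skeleton-based action recognition datasets, including three large-scale datasets (NTU RGB+D, PKU-MMD, and Kinetics), on which new cross-dataset settings and benchmarks are established. Extensive results demonstrate that our method outperforms state-of-the-art approaches. The source code of our model and of all compared methods is available at https://github.com/shanice-l/st-cubism.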
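The temporal pretext task can be sketched as permutation classification: cut a sequence into segments, shuffle them with one of a fixed set of orderings, and train the model to predict which ordering was used. Because the label comes from the data itself, such a task can also be trained on the unlabeled target dataset. The number of segments and the use of every ordering as a class are assumptions; the spatial task would apply the same idea to groups of joints (body parts) rather than time segments.

```python
import itertools
import random

def make_permutations(num_segments):
    """Every ordering of the segments is one class of the pretext classification task."""
    return list(itertools.permutations(range(num_segments)))

def temporal_cubism(frames, rng, num_segments=3):
    """Split a skeleton sequence into equal temporal segments, shuffle them, return (sequence, label).

    Assumption: frames beyond the last full segment are dropped for simplicity.
    """
    permutations = make_permutations(num_segments)
    seg_len = len(frames) // num_segments
    segments = [frames[i * seg_len:(i + 1) * seg_len] for i in range(num_segments)]
    label = rng.randrange(len(permutations))
    shuffled = [frame for s in permutations[label] for frame in segments[s]]
    return shuffled, label

# Integers stand in for per-frame joint coordinates.
shuffled, label = temporal_cubism(list(range(9)), random.Random(0))
```

With three segments there are six orderings, so the auxiliary head would be a six-way classifier trained alongside the source-domain action classifier.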
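Fine-grained sketch-based image retrieval (FG-SBIR) aims to find a specific image in a large gallery given a query sketch. Despite the wide applicability of FG-SBIR in many critical domains (e.g., crime activity tracking), existing approaches still suffer from low accuracy and are sensitive to external noise, such as unnecessary strokes in the sketch. Retrieval performance deteriorates further in the more practical on-the-fly setting, where only a partially complete sketch with a few (noisy) strokes is available for retrieving the corresponding image. We propose a novel framework that leverages a uniquely designed deep reinforcement learning model performing dual-level exploration to handle partial-sketch training and attention region selection. By enforcing the model's attention on the important regions of the original sketches, the model remains robust to unnecessary stroke noise and improves retrieval accuracy by a large margin. To fully explore partial sketches and locate the important regions to attend to, the model performs bootstrapped policy gradient for global exploration while adjusting a standard deviation term that governs a locator network for local exploration. Training is guided by a hybrid loss that fuses a reinforcement loss and a supervised loss. A dynamic ranking reward is developed to fit the on-the-fly image retrieval process with partial sketches. Extensive experiments on three public datasets show that our proposed approach achieves state-of-the-art performance on partial-sketch-based image retrieval.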
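To make the ranking reward and hybrid loss more concrete, the sketch below assumes a reward equal to the reciprocal rank of the true photo (which could be recomputed each time new strokes arrive), a REINFORCE-style term with a baseline as the reinforcement loss, and a weighted sum as the hybrid loss. These formulas are illustrative assumptions, not the paper's exact definitions; the policy and locator networks are omitted.

```python
import math

def rank_of_target(query, gallery, target_idx):
    """1-based rank of the true photo when the gallery is sorted by distance to the sketch embedding."""
    d_target = math.dist(query, gallery[target_idx])
    closer = sum(1 for i, g in enumerate(gallery)
                 if i != target_idx and math.dist(query, g) < d_target)
    return closer + 1

def ranking_reward(rank):
    """Hypothetical reward shape: 1.0 at rank 1, decaying as the true photo slips down the list."""
    return 1.0 / rank

def hybrid_loss(supervised_loss, log_prob, reward, baseline, rl_weight=1.0):
    """Supervised loss plus a REINFORCE-style term; the baseline reduces gradient variance."""
    reinforce_loss = -(reward - baseline) * log_prob
    return supervised_loss + rl_weight * reinforce_loss

# Toy 2-D embeddings: the partial sketch is closer to photo 1 than to the true photo 2.
rank = rank_of_target((0.0, 0.0), [(3.0, 0.0), (1.0, 0.0), (2.0, 0.0)], target_idx=2)
reward = ranking_reward(rank)
loss = hybrid_loss(supervised_loss=0.3, log_prob=-1.0, reward=reward, baseline=0.25)
```

When the reward beats the baseline, minimizing the reinforcement term raises the log-probability of the sampled action (here, the chosen attention region).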
This paper focuses on designing efficient models with few parameters and low FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even though the framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like four-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods; e.g., our EMO-1M/2M/5M achieve 71.5%, 75.1%, and 78.4% Top-1 accuracy on ImageNet-1K, surpassing state-of-the-art CNN- and Transformer-based models while balancing model accuracy and efficiency well.
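As we read it, the Meta-Mobile Block is a residual template x + Shrink(F(Expand(x))) in which the choice of F yields either a convolutional (inverted residual) block or an attention-based block. The pure-Python sketch below shows that template plus one hypothetical iRMB-style F that chains single-head attention with a depthwise convolution. It omits normalization, activations, multi-head and windowed attention, and learned Q/K/V projections, so it illustrates the structure rather than reproducing EMO.

```python
import math

def matmul(rows, weight):
    """rows: n x d_in, weight: d_in x d_out -> n x d_out (a 1x1 'pointwise' projection)."""
    return [[sum(r[i] * weight[i][j] for i in range(len(r))) for j in range(len(weight[0]))]
            for r in rows]

def self_attention(tokens):
    """Single-head attention with Q = K = V = tokens (projections omitted for brevity)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([sum(w * t[c] for w, t in zip(weights, tokens)) / total for c in range(d)])
    return out

def depthwise_conv1d(tokens, kernel):
    """Per-channel convolution over neighbouring tokens (kernel size 3, zero padding)."""
    n, d = len(tokens), len(tokens[0])
    return [[sum(w * tokens[i + o][c] for o, w in zip((-1, 0, 1), kernel) if 0 <= i + o < n)
             for c in range(d)] for i in range(n)]

def meta_mobile_block(tokens, w_expand, w_shrink, operator):
    """x + Shrink(F(Expand(x))): the shared template; `operator` (F) decides the instantiation."""
    hidden = operator(matmul(tokens, w_expand))
    return [[a + b for a, b in zip(x, y)] for x, y in zip(tokens, matmul(hidden, w_shrink))]

def irmb_operator(hidden):
    """Assumed iRMB-style F: attention for long-range mixing, then depthwise conv for local detail."""
    return depthwise_conv1d(self_attention(hidden), kernel=(0.25, 0.5, 0.25))

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
w_expand = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]     # 2 -> 4 channels (expansion ratio 2)
w_shrink = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.0], [0.0, 0.1]]  # 4 -> 2 channels
out = meta_mobile_block(tokens, w_expand, w_shrink, irmb_operator)
identity_out = meta_mobile_block(tokens, w_expand, [[0.0, 0.0]] * 4, irmb_operator)
```

Swapping `irmb_operator` for `lambda h: depthwise_conv1d(h, kernel)` or for `self_attention` alone gives the two classic instantiations of the same template.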
Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for the language model of a state-of-the-art BERT-based QA system. Triples in the form <subject, predicate, object> are extracted from each passage, and questions are formed from subjects (or objects) and predicates, while the objects (or subjects) are taken as answers. Experiments on five extractive QA datasets demonstrate that our technique achieves performance on par with existing state-of-the-art QA systems, with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
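The triple-to-question step can be sketched as follows. The question templates, the SQuAD-style record layout, and the rule of discarding answers that do not appear verbatim in the context passage are assumptions for illustration; OpenIE extraction and paraphrasing are not shown, so the triples are written by hand (the second mimics a triple extracted from a paraphrase).

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

def questions_from_triple(t):
    """Return (question, answer) pairs; the question templates are hypothetical."""
    return [
        (f"{t.subject} {t.predicate} what?", t.obj),         # subject + predicate -> object is the answer
        (f"Who or what {t.predicate} {t.obj}?", t.subject),  # predicate + object -> subject is the answer
    ]

def build_examples(passage, triples):
    """Build SQuAD-style records, keeping only answers that occur verbatim in the passage."""
    examples = []
    for t in triples:
        for question, answer in questions_from_triple(t):
            start = passage.find(answer)
            if start != -1:  # extractive QA needs the answer as a span of the context
                examples.append({"context": passage, "question": question,
                                 "answers": {"text": [answer], "answer_start": [start]}})
    return examples

passage = "Marie Curie discovered polonium in 1898."
examples = build_examples(passage, [Triple("Marie Curie", "discovered", "polonium"),
                                    Triple("Polonium", "was discovered by", "Marie Curie")])
```

The capitalized "Polonium" from the paraphrase-style triple is not a span of the passage, so that question is dropped; the other three pairs become training examples.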
Transformers have achieved impressive success in various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such datasets are usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement gained from ImageNet-pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer with limited medical data, we propose an auxiliary difficulty ranking task: the Transformer must identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer learns to distill transformation-invariant features from the perturbed tokens, simultaneously measuring difficulty and maintaining the consistency of the self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification compared to ImageNet-pretrained weights and state-of-the-art self-supervised learning approaches.
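Since BOLT builds on the bootstrap idea of an online network regressing a slowly updated target network, the sketch below shows three ingredients under that assumption: an exponential-moving-average target update, a normalized regression loss, and a binary cross-entropy for the auxiliary difficulty ranking. Treating perturbation "strength" as a scalar, the loss weighting, and the momentum value are hypothetical; the networks and patch-token pipeline are omitted, and plain lists stand in for weights and embeddings.

```python
import math

def ema_update(target, online, momentum=0.99):
    """Target weights track the online weights as an exponential moving average."""
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target, online)]

def regression_loss(prediction, target_projection):
    """2 - 2 * cosine similarity, i.e. squared distance between L2-normalized vectors."""
    dot = sum(p * z for p, z in zip(prediction, target_projection))
    norm_p = math.sqrt(sum(p * p for p in prediction))
    norm_z = math.sqrt(sum(z * z for z in target_projection))
    return 2.0 - 2.0 * dot / (norm_p * norm_z)

def difficulty_ranking_loss(prob_online_harder, online_strength, target_strength):
    """Binary cross-entropy: did the online branch receive the stronger perturbation? (assumed form)"""
    label = 1.0 if online_strength > target_strength else 0.0
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(prob_online_harder + eps)
             + (1.0 - label) * math.log(1.0 - prob_online_harder + eps))

def total_loss(prediction, target_projection, prob_online_harder,
               online_strength, target_strength, aux_weight=1.0):
    """Hypothetical combination: regression loss plus weighted difficulty-ranking loss."""
    return (regression_loss(prediction, target_projection)
            + aux_weight * difficulty_ranking_loss(prob_online_harder, online_strength, target_strength))

new_target = ema_update([0.0, 1.0], [1.0, 1.0], momentum=0.9)
aligned = regression_loss([1.0, 0.0], [2.0, 0.0])
orthogonal = regression_loss([1.0, 0.0], [0.0, 1.0])
loss = total_loss([1.0, 0.0], [2.0, 0.0], prob_online_harder=0.5,
                  online_strength=0.8, target_strength=0.3)
```

Only the online branch would receive gradients; the target branch changes solely through `ema_update`, which is what keeps the bootstrap targets stable.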
Knowledge graph embedding (KGE), which maps the entities and relations of a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult for KGEs to infer inductively. To address this challenge, we resort to analogical inference and propose AnKGE, a novel and general self-supervised framework that enhances KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects at the entity, relation, and triple levels. In AnKGE, we train an analogy function for each level of analogical inference; it takes the original element embedding from a well-trained KGE model as input and outputs the analogical object embedding. To combine the inductive inference capability of the original KGE model with the analogical inference capability provided by AnKGE, we interpolate the analogy score with the base model score and introduce adaptive weights in the score function for prediction. Through extensive experiments on the FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on the link prediction task and performs analogical inference well.
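The final scoring step can be sketched as an interpolation between a base KGE score and a score computed with an analogical object embedding. Using TransE as the base model, supplying the analogical head embedding by hand (in AnKGE it would come from the trained analogy function), and passing a fixed scalar in place of the adaptive weight are all assumptions for illustration.

```python
import math

def transe_score(head, relation, tail):
    """Base KGE score (TransE used only as an example): higher means more plausible."""
    return -math.dist([h + r for h, r in zip(head, relation)], tail)

def ankge_style_score(head, analogical_head, relation, tail, weight):
    """Interpolate the base score with a score computed from an analogical object embedding."""
    base = transe_score(head, relation, tail)
    analogy = transe_score(analogical_head, relation, tail)
    return (1.0 - weight) * base + weight * analogy

def best_tail(head, analogical_head, relation, candidates, weight):
    """Return the candidate tail name with the highest interpolated score."""
    return max(candidates, key=lambda name: ankge_style_score(
        head, analogical_head, relation, candidates[name], weight))

candidates = {"A": (1.0, 0.0), "B": (3.0, 0.0)}
head, analogical_head, relation = (0.0, 0.0), (2.0, 0.0), (1.0, 0.0)
low = best_tail(head, analogical_head, relation, candidates, weight=0.25)
high = best_tail(head, analogical_head, relation, candidates, weight=0.75)
```

With a small weight the base model's preferred tail "A" wins; with a large weight the analogical evidence favors "B", which is why the weight governs how much the prediction relies on analogy.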
Digital engineering transformation is a crucial process for the engineering paradigm shifts in the fourth industrial revolution (4IR), and artificial intelligence (AI) is a critical enabling technology in digital engineering transformation. This article discusses the following research questions: What are the fundamental changes in the 4IR? More specifically, what are the fundamental changes in engineering? What is digital engineering? What are the main uncertainties there? What is trustworthy AI? Why is it important today? What are the emerging engineering paradigm shifts in the 4IR? What is the relationship between the data-intensive paradigm and digital engineering transformation? What should we do for digitalization? By investigating the pattern of industrial revolutions, this article argues that ubiquitous machine intelligence (uMI) is the defining power brought by the 4IR. Digitalization is a precondition for leveraging ubiquitous machine intelligence. Digital engineering transformation towards Industry 4.0 has three essential building blocks: digitalization of engineering, leveraging ubiquitous machine intelligence, and building digital trust and security. The engineering design community at large has an excellent opportunity to bring together the new capabilities of ubiquitous machine intelligence, trustworthy AI principles, and digital trust in the design of various engineering systems, ensuring the trustworthiness of systems in Industry 4.0.